45 research outputs found
LLMs and Finetuning: Benchmarking cross-domain performance for hate speech detection
This paper compares different pre-trained and fine-tuned large language
models (LLMs) for hate speech detection. Our research underscores challenges in
LLMs' cross-domain validity and overfitting risks. Through evaluations, we
highlight the need for fine-tuned models that grasp the nuances of hate speech
through greater label heterogeneity. We conclude with a vision for the future
of hate speech detection, emphasizing cross-domain generalizability and
appropriate benchmarking practices.
Comment: 9 pages, 3 figures, 4 tables
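The cross-domain validity issue the abstract raises can be illustrated with a minimal evaluation protocol: train on one annotated corpus and test on another. The sketch below is a stand-in, assuming a simple TF-IDF plus logistic-regression baseline rather than the paper's fine-tuned LLMs, and the tiny in-line examples and domain names are invented for illustration.

```python
# Sketch of a cross-domain evaluation protocol: fit a classifier on one
# hate-speech dataset ("domain A") and score it on another ("domain B").
# The examples are hypothetical; the paper benchmarks fine-tuned LLMs,
# not this TF-IDF baseline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

def cross_domain_f1(train_texts, train_labels, test_texts, test_labels):
    """Train on one domain, report macro-F1 on another."""
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(train_texts, train_labels)
    return f1_score(test_labels, clf.predict(test_texts), average="macro")

# Hypothetical labelled examples (1 = hateful, 0 = not hateful).
domain_a = (["I hate group X", "you people are vermin",
             "lovely day today", "great match last night"], [1, 1, 0, 0])
domain_b = (["those people disgust me", "nice weather outside"], [1, 0])

in_domain = cross_domain_f1(*domain_a, *domain_a)  # optimistic: same domain
cross = cross_domain_f1(*domain_a, *domain_b)      # realistic: unseen domain
print(f"in-domain F1 {in_domain:.2f}  cross-domain F1 {cross:.2f}")
```

The gap between the in-domain and cross-domain scores is the overfitting risk the abstract describes: a model can look strong on its own corpus while generalizing poorly to hate speech annotated under different guidelines.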
It Takes Two to Negotiate: Modeling Social Exchange in Online Multiplayer Games
Online games are dynamic environments where players interact with each other,
which offers a rich setting for understanding how players negotiate their way
through the game to an ultimate victory. This work studies online player
interactions during the turn-based strategy game, Diplomacy. We annotated a
dataset of over 10,000 chat messages for different negotiation strategies and
empirically examined their importance in predicting long- and short-term game
outcomes. Although negotiation strategies can be predicted reasonably
accurately through linguistic modeling of the chat messages, such modeling
alone is insufficient for predicting short-term outcomes such as
trustworthiness. On the other hand,
they are essential in graph-aware reinforcement learning approaches to predict
long-term outcomes, such as a player's success, based on their prior
negotiation history. We close with a discussion of the implications and impact
of our work. The dataset is available at
https://github.com/kj2013/claff-diplomacy.
Comment: 28 pages, 11 figures. Accepted to CSCW '24 and forthcoming in the
Proceedings of ACM HCI '2
Social Media and Electoral Predictions: A Meta-Analytic Review
Can social media data be used to make reasonably accurate estimates of electoral outcomes? We conducted a meta-analytic review to examine the predictive performance of different methods and of two families of social media features in predicting political elections: (1) content features and (2) structural features. Across 45 published studies, we find significant variance in the quality of predictions, which on average still lag behind those of traditional survey research. More specifically, our findings show that machine learning-based approaches generally outperform lexicon-based analyses, and that combining structural and content features yields the most accurate predictions.
Understanding and Measuring Psychological Stress using Social Media
A body of literature has demonstrated that users' mental health conditions,
such as depression and anxiety, can be predicted from their social media
language. However, there is still a gap in the scientific understanding of how
psychological stress is expressed on social media. Stress is one of the primary
underlying causes and correlates of chronic physical illnesses and mental
health conditions. In this paper, we explore the language of psychological
stress with a dataset of 601 social media users, who answered the Perceived
Stress Scale questionnaire and also consented to share their Facebook and
Twitter data. Firstly, we find that stressed users post about exhaustion,
losing control, increased self-focus, and physical pain, whereas users who
are not stressed post about breakfast, family time, and travel.
Secondly, we find that Facebook language is more predictive of stress than
Twitter language. Thirdly, we demonstrate how the language-based models thus
developed can be adapted and scaled to measure county-level trends. Since
county-level language is easily available on Twitter using the Streaming API,
we explore multiple domain adaptation algorithms to adapt user-level Facebook
models to Twitter language. We find that domain-adapted and scaled social
media-based measurements of stress outperform sociodemographic variables (age,
gender, race, education, and income), against ground-truth survey-based stress
measurements, both at the user- and the county-level in the U.S. Twitter
language that scores higher in stress is also predictive of poorer health, less
access to facilities and lower socioeconomic status in counties. We conclude
with a discussion of the implications of using social media as a new tool for
monitoring stress levels of both individuals and counties.
Comment: Accepted for publication in the proceedings of ICWSM 201
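The county-level scaling step described above can be sketched in a few lines: average the user-level stress scores within each county, then correlate the resulting estimates with survey-based ground truth. All scores, county FIPS codes, and survey values below are hypothetical; the paper's user-level scores come from domain-adapted language models, not hand-made numbers.

```python
# Sketch of county-level aggregation and validation against survey data.
from collections import defaultdict
from statistics import mean, stdev

def county_scores(user_scores):
    """user_scores: list of (county_fips, stress_score) pairs."""
    by_county = defaultdict(list)
    for fips, score in user_scores:
        by_county[fips].append(score)
    return {fips: mean(scores) for fips, scores in by_county.items()}

def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / ((len(xs) - 1) * stdev(xs) * stdev(ys))

# Hypothetical user-level model outputs keyed by county FIPS code.
users = [("42101", 0.8), ("42101", 0.6), ("06037", 0.3),
         ("06037", 0.5), ("36061", 0.7), ("36061", 0.9)]
estimates = county_scores(users)       # e.g. {"42101": 0.7, ...}
survey = {"42101": 0.65, "06037": 0.35, "36061": 0.85}
fips = sorted(estimates)
r = pearson([estimates[f] for f in fips], [survey[f] for f in fips])
print(f"county-level Pearson r = {r:.2f}")
```

In the paper this correlation is the yardstick by which the domain-adapted language measurements are shown to outperform sociodemographic variables.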
Literature review writing: a study of information selection from cited papers / Kokil Jaidka, Christopher Khoo and Jin-Cheon Na
This paper reports the results of a small study of how researchers select and edit research information from cited papers for inclusion in a literature review. It is part of a larger content analysis and linguistic analysis of literature reviews. The study aims to answer the following questions: where do authors select information from in the cited papers (e.g., the Abstract, Introduction, or Conclusion section)? What types of information do they select (e.g., research objectives, results)? And how do they transform that information (e.g., paraphrasing, cut-and-pasting)? To answer these questions, we analyzed the literature review sections of 20 articles from the Journal of the American Society for Information Science & Technology, 2001-2008. Referencing sentences were mapped to their source papers to determine their origin. Other features of the source information were also annotated, such as the type of information selected and the types of editing changes made to it before inclusion in the literature review. Preliminary results indicate that authors prefer to select information from the Abstract, Introduction, and Conclusion sections of the cited papers. This information is transformed through cut-and-paste, paraphrase, or higher-level semantic transformations to describe the research objective, methodology, and results of the referenced study. The choices made in selecting and transforming the source information appeared to be related to the two styles of literature review ultimately constructed: integrative and descriptive literature reviews.
Keywords: Literature reviews; Multi-document summarization; Information science; Information extraction; Information selection
Predicting Sentence-Level Factuality of News and Bias of Media Outlets
Predicting the factuality of news reporting and the bias of media outlets is
highly relevant for automated news credibility assessment and fact-checking.
While prior work has focused on the veracity of individual news stories, we
propose a fine-grained reliability analysis of entire media outlets.
Specifically, we study the prediction of sentence-level factuality of news
reporting and bias of media outlets, which may more accurately explain the
overall reliability of the entire source. We first manually produced a large
sentence-level dataset, titled "FactNews", composed of 6,191 sentences
expertly annotated according to the factuality and media bias definitions
from AllSides. We then present baseline models for sentence-level factuality
prediction, obtained by fine-tuning BERT. Finally, given the severity of fake
news and political polarization in Brazil, both the dataset and the baselines
were developed for Portuguese; however, our approach may be applied to any
other language.
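The step from sentence-level predictions to the "overall reliability of the entire source" can be sketched as a simple roll-up: the share of an outlet's sentences that a classifier labels factual. The outlet names and predictions below are hypothetical, and this aggregation is an illustration of the idea rather than the paper's documented procedure; the underlying sentence labels would come from the fine-tuned BERT baseline.

```python
# Sketch: roll sentence-level factuality predictions up into an
# outlet-level reliability score (share of sentences labelled factual).
from collections import defaultdict

def outlet_reliability(predictions):
    """predictions: list of (outlet, is_factual) pairs, one per sentence."""
    counts = defaultdict(lambda: [0, 0])   # outlet -> [factual, total]
    for outlet, is_factual in predictions:
        counts[outlet][0] += int(is_factual)
        counts[outlet][1] += 1
    return {o: factual / total for o, (factual, total) in counts.items()}

# Hypothetical per-sentence classifier outputs for two invented outlets.
preds = [("outlet_a", True), ("outlet_a", True), ("outlet_a", False),
         ("outlet_b", False), ("outlet_b", False), ("outlet_b", True)]
print(outlet_reliability(preds))   # outlet_a: 2/3, outlet_b: 1/3
```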
Just Another Day on Twitter: A Complete 24 Hours of Twitter Data
At the end of October 2022, Elon Musk concluded his acquisition of Twitter.
In the weeks and months before that, several questions were publicly discussed
that were not only of interest to the platform's future buyers, but also of
high relevance to the Computational Social Science research community. For
example, how many active users does the platform have? What percentage of
accounts on the site are bots? And what are the dominant topics and
sub-topical spheres on the platform? In a globally coordinated effort of 80
scholars to shed light on these questions, and to offer a dataset that will
equip other researchers to do the same, we have collected all 375 million
tweets published within a 24-hour time period starting on September 21, 2022.
To the best of our knowledge, this is the first complete 24-hour Twitter
dataset that is available for the research community. With it, the present work
aims to accomplish two goals. First, we seek to answer the aforementioned
questions and provide descriptive metrics about Twitter that can serve as
references for other researchers. Second, we create a baseline dataset for
future research that can be used to study the potential impact of the
platform's ownership change.
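The descriptive metrics the abstract mentions, such as active-user counts and tweet volume over the 24-hour window, can be sketched over a newline-delimited JSON dump. The record layout (`author_id`, `created_at`) is an assumption for illustration, not the dataset's documented schema, and the sample records are invented.

```python
# Sketch: count unique accounts and tweets per hour from a
# newline-delimited JSON tweet dump (hypothetical field names).
import json
from collections import Counter
from datetime import datetime

def describe(lines):
    users, per_hour = set(), Counter()
    for line in lines:
        tweet = json.loads(line)
        users.add(tweet["author_id"])
        hour = datetime.fromisoformat(tweet["created_at"]).hour
        per_hour[hour] += 1
    return len(users), per_hour

# Invented sample records within the dataset's 24-hour window.
sample = [
    '{"author_id": "u1", "created_at": "2022-09-21T00:15:00+00:00"}',
    '{"author_id": "u2", "created_at": "2022-09-21T00:45:00+00:00"}',
    '{"author_id": "u1", "created_at": "2022-09-21T13:05:00+00:00"}',
]
n_users, per_hour = describe(sample)
print(n_users, dict(per_hour))   # 2 {0: 2, 13: 1}
```

At the scale of 375 million tweets the same logic would stream the file rather than hold it in memory, but the metrics themselves are computed the same way.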